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PERFECT SAMPLING USING BOUNDING CHAINS 

By Mark Huber 1 

Duke University 

Bounding chains are a technique that offers three benefits to 
Markov chain practitioners: a theoretical bound on the mixing time 
of the chain under restricted conditions, experimental bounds on the 
mixing time of the chain that are provably accurate and construction 
of perfect sampling algorithms when used in conjunction with pro- 
tocols such as coupling from the past. Perfect sampling algorithms 
generate variates exactly from the target distribution without the 
need to know the mixing time of a Markov chain at all. We present 
here the basic theory and use of bounding chains for several chains 
from the literature, analyzing the running time when possible. We 
present bounding chains for the transposition chain on permutations, 
the hard core gas model, proper colorings of a graph, the antiferro- 
magnetic Potts model and sink free orientations of a graph. 

1. Introduction. The breadth of applications using Monte Carlo Markov 
chain (MCMC) techniques today is astounding. From theoretical physics to 
approximation of $P-complete problems to Bayesian inference, all of these 
techniques rest on the ability to construct a Markov chain whose stationary 
distribution matches a particular target distribution whose normalizing con- 
stant is unknown. Many methods (such as Gibbs sampling and Metropolis- 
Hastings [15, 22]) exist for constructing such chains, but finding the mixing 
time of these chains is, in general, extremely difficult. Although some the- 
oretical progress on this problem has been made [5], for most chains of 
practical interest, heuristics, such as autocorrelation tests [11], are the only 
methods available. Such methods only give negative results — they can in- 
dicate when a Markov chain has not mixed, but do not guarantee that the 
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random variates obtained come from the correct distribution or anything 
remotely close to it. Estimates based upon such samples will necessarily be 
suspect. 

Bounding chains, when applicable, cut through this problem and give 
true algorithms for obtaining random variates. They are useful for MCMC 
in three ways. First, they are formed from a natural extension of couplings 
we shall refer to as complete couplings. When it is possible to analyze the 
bounding chain theoretically, this relationship to couplings allows derivation 
of a theoretical bound on the mixing time of the Markov chain based on well- 
known results. 

Second, bounding chains are themselves Markov chains that can be sim- 
ulated. Again because of the relationship to couplings, simulations of the 
bounding chain can give insight into the probability of coupling in the orig- 
inal chain and so experimental evidence can be obtained about the mixing 
time of the original chain. To be clear, this evidence is very different from 
that obtained from autocorrelation tests. Autocorrelation tests can only give 
evidence that a chain has not mixed, and the more complex the chain, the 
less value such tests have. Use of bounding chains can give direct evidence 
that a chain is mixing rapidly and the quality of this evidence can be ana- 
lyzed precisely. 

The third use is the most powerful: bounding chains allow a user to uti- 
lize perfect sampling algorithms such as the coupling from the past (CFTP) 
method of Propp and Wilson [23] or the method of Fill, Machida, Mur- 
doch and Rosenthal [10]. Perfect sampling techniques (also known as per- 
fect simulation methods) draw samples exactly from distributions where the 
normalizing constant of the distribution is unknown. Although both CFTP 
and [10] utilize Markov chains in their operation, they are true algorithms: 
there is no need to have an understanding of the mixing time of the un- 
derlying Markov chain in order to run these algorithms. At termination the 
variates generated come exactly from the target distribution. 

In this work we will present the basic framework for bounding chains, 
along with the primary method for their construction. This will then be 
applied to an introductory example, followed by more advanced applications 
drawn from statistical mechanics, graph theory and the approximation of 
jj-P-complete problems. 

2. Bounding chains. Bounding chains were introduced independently by 
the author [16] and Haggstrom and Nelander [14] . Here we present two basic 
ways of looking at bounding chains that arise naturally from consideration 
of couplings. 

Let M. be a Markov chain with state space O, kernel K and stationary 
distribution tt. Our goal is to generate random variates from it. The classical 
method of starting the Markov chain at a state xq and running for a large 
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number of steps requires some means for measuring how close the distri- 
bution is to stationarity, such as the total variation distance: ||K (xor) — 
7t(0||tv =sup A |K*(x , A) - tt(A)\. 

Let Mi and M.<i be two chains with kernels Ki and K2 and state spaces 
ill an d ^2- A coupling between Ai\ and M2 is a bivariate process (Xt,Yt) 
on x S^2 , where the marginal distribution of Xt is Markov with kernel Ki 
and Yt is also marginally Markov with kernel K2. 

Suppose we couple a chain with a copy of itself to give the bivariate 
process (Xt,Yt). We shall refer to this particular coupling as a simple, or 
pairwise, coupling. Let Xq = xq, Yq ~ n, and suppose that X t = Y t ^- X t i = 
Yf for all t' > t. Under this (very strong) condition we may employ the 
Coupling lemma [1, 7] to state: 

\\K\x ,-)-ir(-)\\ TY <P(X t ^Y t ). 

Perfect sampling protocols such as coupling from the past require a more 
global form of coupling. Suppose that we have a family of measurable func- 
tions from f2 to f2 and a probability distribution on the family such that for a 
draw (ft, P((p(x) £ A) = K(x, A) for all x £ Q and measurable A. If cfto, <f>i,... 
are a sequence of independent draws from this family of distributions, then 
for any xq € VL, xq, 4>q(xq) , 4>\{4>q{xq)) , . . . will be a simulation of the Markov 
chain starting from state xq. Of course, if we start from xq and yo using the 
same draws <j>o, 4>i, ■ . ■ , then we have a simple coupling satisfying the condi- 
tions of the Coupling lemma. In this paper we shall refer to such a family of 
functions and its distribution as a complete coupling. Let <3?o be the identity 
function and <I> t = 4>t-l $t-i for t > 1. Then if $t(fi) = {c}, that is, if ® t is 
a constant function, then every single state in the state space has coupled. 
This immediately gives us: 

||K*(xo, •) — 7t(-)||tv < P(&t is not constant). 

CFTP and related methods generate variates drawn exactly from the 
target distribution ir. Three elements are required: a Markov chain with 
7T as its stationary distribution, a complete coupling for the chain and a 
means of determining when $t is constant. CFTP computes &t f° r a fixed 
value of t. If <!>£ is a constant state x after these t steps, then x is output 
as the random variate. If &t is not constant, CFTP is called recursively 
to generate a random variate y. Then x = &t(y) is output as the random 
variate. As long as there is a reasonably large probability $j is a constant, 
the number of recursive calls within CFTP will be small. The output of this 
algorithm is distributed exactly according to it, the stationary distribution 
of the chain [23] . 

One effective means for determining complete coupling in order to use 
CFTP is to take advantage of a monotonic structure in the chain. We will not 
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go into the details here, other than to note that this idea was later extended 
to a wider variety of chains by introducing antimonotonicity [13]. Still, there 
remain chains of interest that are neither monotone nor antimonotone. 

Bounding chains are a method for determining when <3? 4 is constant for a 
wide variety of state spaces. The two forms of bounding chains we present 
are identical when dealing with discrete state spaces, but act differently 
when extended to continuous state spaces. 

Suppose that the chain M. with state space Q C C v for some set of dimen- 
sions V and some set of possible values C (we shall refer to C as the set of 
colors). Usually |0| is growing exponentially with |V|, which is why MCMC 
methods are commonly used in this situation. Consider a new Markov chain 
M.' with state space (2 C ) V , where 2 C is the set of subsets of C . Let X t be a 
stochastic process evolving according to M. and It be a stochastic process 
evolving according to Ai' . 

Definition 1 (Form 1). We will say that A4 f is a bounding chain for 
M if there exists a coupling between M' and M such that 

X t (v)EY t (v) Vv =^ X t+1 (v) eY t+1 (v) Vv. 

Each dimension v 6 V is given a single color from C in Xt and a subset 
of colors from C in Yj. If we have M.' bounding A4, we have a coupling 
for (X t ,Y t ) so that by starting with Xq(v) G Yq(v) for all v, we guarantee 
that Xt(v) E Yf(v) for all v £ V and all times t > 0. If Xt is evolving using 
00)01) •••) an d ^o bounds every state in 0, then when Yt bounds just one 
state x, that means that <&t is a constant function. The number of states 
bounded by Yt is at most n^l^i(' u )l) so to check that <&£ is constant, it 
suffices that |^t(^)| = 1 for all v. 

Given a complete coupling for M., generated by 0o, 0i, . . . , we can con- 
struct a bounding chain as follows. Say that x G C v is bounded by y G (2 C ) V 
(denote x £y y) if x(v) 6 y(v) for all v. Then given state (Xt,Y t ), let Xt+i = 
4>t(X t ) and let 

Y t+1 (v)= |J (&(x))(«). 

That is, consider each and every possible state x that is bounded by Yt. Take 
the next step from each of these states x and examine the set of colors for 
v that results. By taking Yt+\{v) to be this set of colors and doing this for 
every node v 6 V, we guarantee that Y t+ i bounds the state Xt+i- Starting 
the bounding chain in a state Yq that bounds Xq is easy when V and C 
are discrete: just make Yq{v) = C for all v € V. Examples of problems where 
Form 1 is the most useful formulation include proper colorings of a graph 
and the Potts model. 
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Sometimes it is more convenient to write the state space as , so 

if we think of a configuration as a function mapping V to C, instead keep 
track of the inverse of the function for each c G C. That is, for each color, 
we record the subset of dimensions that have that color. In this case, the 
bounding chain A4" will have state space {(6, d) : b, d G Q, b(c)nd(c) = Vc}, 
that is, ordered pairs of states from that do not intersect. Let At be a 
stochastic process evolving according to M and (Bt,Dt) a stochastic process 
evolving according to M" . 

Definition 2 (Form 2). We will say that M" is a bounding chain for 
A4 if there exists a coupling between A4" and M such that 

B t (c)CAt(c)QBt(c)UD t (c) Vc 

B f+ i(c)Ci w (c)C%( c )uA+i(c) Vc. 

Intuitively, for At bounded by (Bt,Dt), Bt(c) is the set of dimensions 
that are "known" to have color c in At, while Dt are those dimensions that 
"might" have color c in At- As with the other form, complete coupling has 
occurred when Dt is empty, which is very easy to check. 

We can construct a bounding chain in Form 2 using a similar method 
to the first form. Say that a is bounded by (Bt,Ct) if for every c G C we 
have Bt(c) C a C B t (c) U -Dt(c) and we write a Gc (-E>t> D t ). Given (j>Q,(j)i,... 
for a complete coupling, let A t +\ = (f)t(A t ), and for each c G C, set 

B t+1 (c)= n (<M«))( C ), 

ae (Bt,.Dt) 




Start the bounding chain in state (-E>o> A)) 5 where Bq(c) = and Dq(c) = V 
for all c G C. This will trivially bound Aq. 

Examples of problems where Form 2 is more useful include such chains 
as Swendsen-Wang [17] and independent sets of a graph [19]. 

3. Transposition chain for permutations. Our first application of the 
bounding chain technique is a toy example intended to illustrate several 
aspects of the method. While no practitioner would use this chain for gener- 
ation of random permutations, the techniques we use here will be applicable 
to more sophisticated examples later. Consider the problem of generating 
random permutations of n objects uniformly at random. Using our coloring 
scheme, is the subset of {1, . . . , n}^ 1 '---'^ such that x(v) ^x(w) Vu,w. If 
x(i) = j, we will say that item j is in position i. The Markov chain takes 
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steps by randomly moving between states that differ by a single transposi- 
tion. Specifically we set 



There are of course several ways to create complete couplings for this 
kernel. For instance, i and j could be chosen independently and uniformly 
at random from {1, ... ,71}, and then set xt+i(i) = %t(j) and %t+i(j) = %t(i)- 
This is equivalent to switching the items at position i and j. 

Alternately, items i and j could be swapped, so x t +i(x^ 1 (i)) =j and 
x t +i(xj l (j)) = i. Although these satisfy the definition of complete couplings, 
<3?j will never become constant for either of these approaches and the bound- 
ing chain derived from them will never detect complete coupling. 

A more useful complete coupling swaps item i with position j. To turn 
this complete coupling into a bounding chain, we just follow the procedure 
outlined in the previous section. We will utilize Form 1 (1) in this example. 
We begin with all the positions assigned state {l,...,n} in the bounding 
chain. As we run the bounding chain, some positions will move to singleton 
sets of colors, but some will have more than one color. Call a position known 
if it is assigned a set of size 1 in the bounding chain state and call it unknown 
otherwise. Similarly, call an item known if some position is known and as- 
signed that item, and call it unknown otherwise. To minimize the needed 
updating, each unknown position in the bounding chain will always be as- 
signed {1, . . . ,n} even though we could shrink the size of this set somewhat 
by considering the values of the known positions. 

Suppose our bounding chain is in state Yf and we choose to swap item 
i with position j. Then for any x bounded by Yt, cj>t{x){j) = i because we 
move item i to position j. Taking the union of {i} over all x bounded by Yf 
just gives Y t+1 (j) = {i}. 

To find lf+i(i), we must consider two cases. In the first case, i is known 
to have position i pos in Y t [so Y t (i pos ) = {i}]. In this case, (f>t(x)(i pos ) = x(j). 
Now the union over all x bounded by Y t of x(j) is a subset of Y t (j), so 
setting Y t+ i(i pos ) = Y t (j) insures that Yt+i bounds 4>t{x). 

In the second case, i has unknown position in Yt, so that the location of i 
in an x bounded by If is somewhere in the unknown positions in Yf. Hence 
the item at position j moves to one of these unknown positions at the next 
time step. But any unknown position k has Yt(k) = {1, . .. ,n} and so setting 
Y t+ i(k) = Y t+ i(k) for all k 7^ j makes sure that for x bounded by Yf, (j>t(x) 
is bounded by Y t+ \. 

Now we analyze the expected time needed for this bounding chain to 
detect complete coupling, that is, for all of the positions to become known. 




x = y, 
otherwise. 



x,y differ in two positions 



BOUNDING CHAINS 



7 



Lemma 1. Let r be the first time the bounding chain detects complete 
coupling. Then 



11 2 „> r>r . rn?\-l\ „-0.18C 



(1) E[r]<yn 2 and P(T>CE[r])<e 
for all C>2. 

Proof. Let Wt denote the number of unknown positions in the bound- 
ing chain at time t. We are interested in computing the expected time for Wt 
to hit 0. Suppose that item i and position j are selected for the switch. Item 
i and position j can each be either known or unknown. If at least one of i or 
j is unknown, then Wt is unchanged. If however, both i and j are unknown, 
then after one step item i (or equivalently position j) becomes known, Wt 
decreases by i. Since i and j were chosen uniformly and independently, the 
probability that both are unknown is just (Wt/n) 2 . Hence the expected time 
until Wt decreases by 1 is just (njWt) 2 and the expected time until Wt = 
is 

(2) V, -) <n 2 V^<n 2 ^. 

For the tail bound, note that the probability that T > 2E[r] is at most 1/2 
by Markov's inequality. So for each 2£/[r] time steps we have at least a 1/2 
chance of detecting complete coupling, so after CE[t] steps, the probability 
that we have failed is at most (l/2)^ c ^ for integer C . Accounting for the fact 
that C might be real and not an integer reduces our bound to (l/2)( c / 4 ) < 
exp(-0.18C). □ 

In this case, the expected time for the bounding chain to detect complete 
coupling comes within a constant factor of the expected time needed for 
any complete coupling from the class we are considering to couple even 
two processes. Suppose that {X t } and {Y t } are both evolving according to 
4>o, 0i, . . . and let r be the amount of time needed for coupling to occur. Note 
that Xt and Yt cannot differ in exactly one position and each step changes 
at most 4 positions. At some point before coupling occurs they differ in 2, 3 
or 4 positions. To finish the coupling, either Xt or Yt must change states and 
one of Xt and Yt must choose two positions where they differ. This occurs 
with probability at most 24/ro 2 and so the expected time to couple will be at 
least n 2 /24, differing from the bounding chain upper bound by a constant 
factor. 

While 0(n 2 ) time is needed for this type of coupling, the chain itself is 
known to mix in O(nlnn) time [6]. These special couplings do not always 
tightly bound the mixing time of a Markov chain, a weakness inherited by 
bounding chains. 
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4. The hard core gas model. An independent set of a graph is a subset 
of nodes such that no two nodes in the subset are connected by an edge 
of the graph. In this section we wish to generate random variates from the 
distribution 

\\A\ 

(3) < A ) = ^> 

where A is any independent set of the graph, A is a parameter of the model 
known as the fugacity and Z\ is the (unknown) normalizing constant often 
called the partition function of the model. 

In statistical physics, this distribution is known as the hard core gas 
model [25] and is a simple model of gases where each node of the graph 
might or might not be occupied by a gas molecule of nonnegligible size so 
that any two nodes connected by an edge cannot both be occupied simulta- 
neously. The fugacity controls the density of the gas. Despite its simplicity, 
the model exhibits a phase transition when the graph satisfies symmetry 
properties. For instance, the lattice Z d satisfies these properties. The same 
distribution also arises from certain stochastic models of communication 
networks [21]. 

Computation of the partition function for general graphs is a (jP-complete 
problem [8]. Because this is an example of a self-reducible problem, the 
ability to efficiently generate variates from this distribution immediately 
leads to a polynomial time approximation algorithm for finding Z\ [20, 24]. 

In [9], Dyer and Greenhill proposed a Markov chain for this distribu- 
tion that was neither the straightforward Gibbs sampler nor a Metropolis- 
Hastings algorithm, but instead a combination of the two that we shall 
refer to as the Dyer-Greenhill chain. They used path coupling to show that 
when A < 2/ (A — 2), where A is the maximum degree of the graph, the chain 
mixes in O(nhm) time. Their chain is neither monotonic nor antimonotonic. 
We present a complete coupling of their chain, develop the corresponding 
bounding chain and show the following: 

Theorem 1. When A < 2/ (A — 2), the bounding chain for the Dyer- 
Greenhill chain detects complete coupling after [log«n] + 9 steps with prob- 
ability at least 1 — (3 d , where 

AA 



(4) (3- 



2(A + 1)' 



The state of the chain at time t can be described by a single subset of 
nodes At that is the independent set. Using Form 2 gives us a bounding chain 
whose state is a single ordered pair (Bt, Dt). A regular Gibbs sampler chooses 
a node at random, and then adds or deletes that node with probabilities 
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that depend on A and whether neighbors of the node are in At. The Dyer- 

Greenhill chain adds a new wrinkle: if exactly one neighbor of the chosen 

vertex is in the independent set, the new node can "swap," where it is added 

to the independent set and the neighbor is deleted. This swap move occurs 

with probability p S wap> a new parameter of the chain. Let s £jj S mean that 

we choose an element s uniformly at random from the set S. Given a graph 

with node set {1, . . . , n}, the following is a complete coupling for their chain 

that takes a single step from state A: 

The Dyer-Greenhill chain for the hard core gas model 

1. Choose v Ejj {1, n} 

2. Choose U e v [0, 1] 

3. Case I: U > A/(A + 1) 

4. Let A <- A \ {v} 

5. Case II: U < A/(A + 1), and no neighbors of v are in A 

6. Let A^AU {v} 

7. Case III: U < _p S wa P A/ (A + 1), exactly one neighbor of v (call it w) is in A 

8. Let A <- A \ {w} U {v} 

Lines 1 and 2 are the random choice of (j) for this time step and the 

remaining lines merely compute 4>{A). For the bounding chain we choose v 

and U in the same way and then see how (B, D) changes based on those 

decisions. Using Form 2 (2), we wish to insure that BCiCBuD. 

The bounding chain considers six cases. Case I of the original chain corre- 
sponds to Case I below, where the state of the neighbors of v do not matter 
at all because we have rolled to remove v from the independent set. Case II 
corresponds to cases in the chain where no neighbors of v are in B and in 
Case III exactly one neighbor of v will be in B. 
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Bounding chain for the Dyer-Greenhill chain 




1. 


Choose vertex v €jj {1, . . . , n}, let iV„ be the neighbors of w 


2. 


Choose [/ Gy [0, 1] 




3. 


Case I: U > A/(A + 1) 




4. 


Let B <- B \ {«}, D^D\ {v}, 




5. 


Case Ha: B < A/(A + 1), \N V D B\ = \N V f)D\ = 




6. 


Let B <- B U {«}, D ^ D \ {v} 




7. 


Case lib: p S wa P A/(A + 1) < B < A/(A + 1), 1 7V„ n B| = 


o, |i\r„nD| = i 


8. 


Let B <- B \ {v}, D^D(j{v} 




9. 


Case lie: U < Pswap X/(X + 1), |JV„ n B\ = 0, |JV„ n D| = 


= i 


10. 


Let B <— B U {v}, D^D\{v}\{N v D D) 




11. 


Case lid: B < A/(A + 1), \N V nB| = 0, |iV„ n D| > 2 




12. 


Let B <- B \ {i;}, D^DU{v} 




13. 


Case Ilia: U <p swap A/(A+ 1), |JV w nB| = l, |^nL>| 


= 


14. 


Let B <- B U {«} \ (JVt, fl B), D <—D \ {v} 




15. 


Case Illb: C/ < p swap A/(A + 1), \N v nB\ = l, \N v nD\ 


> 1 


16. 


Let B <- B \ {v} \ (N v n B), B <- B U {v} U (JV„ n B 


) 



Note that Case II has four subcases. If no neighbors of v are in D as well, 
then the node is always added for any A bounded by (B, D). If two or more 
neighbors of v are in D, then v might be added or it might not, so we must 
add it to D. 



If exactly one neighbor of v is in D, then if we do not swap based on U, 
we could end up with v in the independent set or not for a state bounded 
by (B,D). If we roll to switch, however, it does not matter whether that 
neighbor is in or out of the independent set: either way it is out at the end 
of the step and v will be in the independent set. In terms of the bounding 
state, this allows us to remove the neighbor from D and add v to B. 

Finally, if there is more than one neighbor of v in D, there are cases 
where W6 cidd or do not add regardless of the value of Pswap? 

and so v must 

be added to D at the next step. 

Case III will deal with the subcases where exactly one neighbor of v is 
in B and we roll to swap. Note that if two or more neighbors of v are in B 
or if one neighbor of v is in B and we do not roll to swap, the state always 
remains unchanged and so we leave the bounding state unchanged as well. 

If no neighbors of v are in D, then the new state of v is completely 
determined by the neighbors of v that are all known. If at least one neighbor 
of v is in D, then if we do not swap v is always out, but if we do swap then 
v could be either in or out of the final state, and the neighbor of v could 
be in or out of the new independent set as well. This means that both must 
join D. 

Proof of Theorem 1. The bounding chain detects complete coupling 
when \Dt \ =0 and so we are interested in the behavior of \Dt\. In fact, we 
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will upper bound E[\D t +\ \ \B t ,D t ] . To make the analysis easier, suppose that 
we have a line between lines 1 and 2 where we remove node v from Dt. We 
did not do this in our description of the bounding chain simply because it is 
redundant if v is not in Dt and unnecessary if we end up adding v back to 
Dt. However, we can insert this line without changing the evolution of the 
bounding chain at all (which case we are in depends only on the neighbors 
of v — not v itself). 

This move decreases \Dt\ by 1 for all v € Dt, which happens with prob- 
ability \Dt\/n. The remaining lines have a chance of adding one or more 
nodes back to D t . 

If we are in Cases I, Ha or Ilia, we see that nodes are never added to Dt- 
Case lib is a bad case; v is added back to D t with probability (1 — p S wap)A/(A + 1) 
Case lie is a good case; a node other than v is removed from Dt with 
probability p swap \/(\ + 1). Case lid is another bad case; v is added to D t 
with probability A/(A + 1). Case 111b is the worst case: \Dt\ grows by 2 with 
probability p swap A/ (A + 1). Letting a = A/(A + 1), we have 

E[\D t+ i\\B t , D t ] = \D t \ - + ^1(1 -pswa P )a - ^-£>swa P a 

n n n 

N 02+ _^N 11+ 

H a + 2 Pswa P a, 

n n 

where Nqi is the number of nodes with \N V n B\ = and \N V fl D\ = 1, Nn + 
is the number of nodes with \N V ClB\ = 1 and \N V HD\ > 1, and A^o2+ is the 
number of nodes with \N V fl D\ = and \N V fl B\ > 2. 

At this point, we take advantage of our ability to choose p S wap- Setting 

Pswap 

= 1/4 means that 1 - 2p swap and 2p swap both are 1/2. Hence 



(5) E[\D t+l \\B u Dt} = \Dt\ + - 

n 



\ D t\ + x-^oia + -jNu+a + iVo2+a 



For a node to count towards Nqi or iV"n + , it must be adjacent to a node in 
Dt- Similarly, nodes counting towards ./V02+ must adjoin at least two nodes 
in Dt. Since A is the maximum degree of the graph, the number of such 
nodes is at most A|Z?t|. Substituting 

A r oi + A r n+ + 2iV 02+ < A|A| 

into (5) yields 

E[\D t+1 \\Bt,Dt] < \D t \[l - (1 -aA/2)} 
and taking expectations of both sides gives 

E[\D t+1 \]<E[\D t \][l-(l-aA/2)}. 
A simple induction using |Z?o| = n gives us 

£[|A|]<n[l-(l-aA/2)]*. 
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As long as 1 — aA/2 > or, equivalently, A < 2/(A — 2), this inequality will 
be a shrinking upper bound on E^D^]. Setting = [1 — (1 — aA/2)] gives 
us that after [log^n] + 9 steps, we will have S[|Z?t|] < (3 d and by Markov's 
inequality P{ j D t | > 1 ) < (3 6 . □ 

When (3 = 1, we could (using Martingale theory) show that the time until 
complete coupling is 0(n 2 ). Because such an analysis is of limited utility we 
omit it here. 

In [9], Dyer and Greenhill showed that a simple pairwise coupling will 
couple in O(nlnn) time when p swap = 1/4 using the technique of path cou- 
pling. This bounding chain not only provides a perfect sampling algorithm, 
it also serves as an independent proof of their result. 

5. Proper colorings of a graph. A proper coloring of a graph (V, E), 
where \V\ = n is an element x of {1, . . . , k} N such that for all edges {v, w} € 
E, x(v) x(w). 

Consider the Gibbs sampler chain for this problem. Given a proper color- 
ing x, let b x (v ) denote the number of different colors not used by neighbors 
of the node v. Then for configurations x and y that differ in color at exactly 
one node 



K(x,y) 



1 



nb x (v) ' 

Y — 

o, 



x. 



y differ at node v, 



otherwise. 



A bounding chain for this Gibbs sampler was presented independently 
by the author [16] and Haggstrom and Nelander [14]. The complete cou- 
pling that produces this bounding chain is as follows (here x is the current 
configuration) : 



The Gibbs sampler for proper colorings 

1. Choose v Ejj {1, . . . , n}, let N v be the neighbors of v 

2. Repeat 

3. Choose c {1, k} 

4. Until c<£x(N v ) 

5. Let x(v) c 



In other words, keep choosing colors for the chosen node uniformly at 
random from the set of colors until a color is found which does not match 
any of the neighbors of v. This will be the new color for v and is uniform 
over the set of colors not used by neighbors of v. 

We use Definition 1 for the bounding chain. Again we choose colors, but 
this time we might not know whether or not to reject the new color. If A is 
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the maximum degree of the graph, then after choosing A + 1 different colors 
we are guaranteed to have picked one that must be accepted. This logic can 
be codified in the following bounding chain. Let y be the current state. 



Bounding chain for the Gibbs sampler for proper colorings 

1. Choose v Ejj {1, . . . , n}, let N v be the neighbors of v, let y(y) 

2. Repeat 

3. Choose c <Ejj {1, k} 

4. If no neighbor w of v has y(w) = {c} 

5. Let y(v) <— y(v) U {c} 

6. UntUc^Utueiv^H or \v( v )\ > A 

7. Let x(v) *— c 



Theorem 2. When the number of colors k at least A(A + 2), let 

l-(A + l)A/[*-A + l] - 
n 

The probability that the bounding chain detects complete coupling in log^ n + 
steps is at least 1 — [3® . 

Proof. Let Wt denote the number of nodes v with > 1. We begin 

with Wq = n and wish to find the expected time until Wt = 0. Given Yt, we 
wish to find the expected value of Wt+i. As in the previous section, if we 
select a node v with > 1, we reduce Wt by 1 at that point. 

Now, if we select a color for v that lies within Yt(w) for some neighbor w 
of v with | If (to) | > 1, that could increase Wt by 1. The probability of this 
happening is at most [J2 w eN v \Yt(w)\l\Y t (v)\>i]/ (k — (A — 1)). (We subtract 
A — 1 from k in the denominator to account for neighbors with known colors 
that could be preventing some colors from being chosen.) In our bounding 
chain we always have < A + 1. Also Y^vJ2weN v ^-\Y t (v)\>l * s J us t t ne 

number of nodes adjacent to nodes with \Yt(v )| > 1 and so is bounded above 
by AWt- Putting this all together gives us 

v weN v 

W t Wt 
<W t - — + (A + 1)A^ 
n n 



W t 



l-(A + l)A/[fc-A + l] 
n 



so 



(7) 



E[W t+ i]=(3E[Wt\. 
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Using Wo = n, an induction gives us £7[Wt] = n. As in the previous section, 
application of Markov's inequality gives us P(Wt > 1) < l£[Wt] < /3*n and 
the result directly follows. □ 

The best results known for approximately uniform generation of proper 
colorings is a Markov chain of Vigoda [26] that mixes in polynomial time 
when k > 11 A/6. This requires far fewer colors in theory, but when k lies 
outside this bound the user is given no information. Even when k < A(A + 2) , 
a user can still attempt to use the bounding chain approach to generate 
variates, as (unlike the classical Markov chain method) it is a true algorithm. 
There is simply no a priori guarantee that the algorithm will terminate in 
polynomial time. 

6. The antiferromagnetic Potts model. The Potts model is an extension 
of the Ising model from statistical physics. Each node of a graph is assigned 
a color from {1, ...,k} in a configuration. The energy of configuration x 
iS H(x) = - J2v J2wGN v &x (v)=x(w)i where N v is the set of nodes adjacent to 
v. The probability of choosing a particular configuration is 



Here T is a positive parameter of the chain known as the temperature, J is 
either 1 or —1, and Zt is (as with the hard core gas model) known as the 
partition function. When J = 1 more weight is given to configurations with 
endpoints of edges colored the same way; this is known as the ferromagnetic 
Potts model. When J = — 1, configurations where endpoints of edges receive 
different colors are assigned higher weight: this is the antiferromagnetic Potts 
model. 

In the ferromagnetic case, monotonic chains for alternate formulations of 
this model exist [12, 23], but for the antiferromagnetic case no chain is known 
to be monotonic unless the graph is bipartite and k = 2. We describe two 
bounding chains for the antiferromagnetic Potts model based upon the Gibbs 
sampler. The Gibbs sampler chain chooses a node uniformly at random and 
then chooses a new color for the node according to ir conditioned on the 
value of the rest of the sample. 

For the antiferromagnetic Potts model, as the temperature goes to the 
weight moves towards proper colorings of a graph. For this reason, the uni- 
form distribution over proper colorings of a graph is also known as the 
antiferromagnetic Potts model at zero temperature. 

A bounding chain for the Gibbs sampler can be constructed by noting 
that each color has a minimum probability of being chosen, regardless of 
the values of the rest of the sample. This allows the bounding chain to be 
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moved to singleton sets with positive probability. This was the approach 
used in [16]. 

Unfortunately, this method has several disadvantages. First, it is time 
consuming to compute the minimum probabilities for each of the colors. 
Second, this approach does not result in a provably polynomial time algo- 
rithm even when k is large. Intuitively, positive temperature should make it 
easier to generate random variates, and so we should always be able to gen- 
erate samples in polynomial time when k > A(A + 1) for any temperature 
and with fewer colors as the temperature rises. 

We now present a complete coupling for the Gibbs sampler for the anti- 
ferromagnetic Potts model. For notational convenience, let 7 = exp(2/T). 

The Gibbs sampler for antiferromagnetic Potts model 

1. Choose v {1, . . . ,n}, let N v be the neighbors of v 

2. Repeat 

3. Choose c Gu {1, k} 

4. Choose U e v [0, 1] 

5. Let a c be the number of neighbors of v with color c in x 

6. Until U < 7 _ac 

7. Let x(v) <— c 

We extend this complete coupling to a bounding chain in the same way 
as the proper colorings chain: bound a c between b c and d c , which are the 
minimum and maximum number of neighbors of v with color c over the set 
of x bounded by y. 

Bounding chain for the Gibbs sampler for antiferromagnetic Potts model 

1. Choose v Gu {1, . . . , n}, let N v be the neighbors of v 

2. Let y(v) <- 

3. Repeat 

4. Choose c 6c; {1, k} 

5. Choose U G v [0, 1] 

6. Let b c be the number of w neighboring v with y(w) = {c} 

7. Let d c be the number of w neighboring v with c S y(w) 

8. If U < 7" & <= 

9. Let y(v) <- y(v) U {c} 

10. Until U < 7~ dc or \y(v)\ > A 



Theorem 3. Let r he the first time that the bounding chain detects 
complete coupling. Then for k > A(A + 2), and any temperature T, or for the 
case where A<k< A(A + 2) and T satisfies exp(2/T) < A 2 /[A 2 + 2A - 1], 
and finally for the case where k < A and T satisfies exp(2/T) < Ak/[Ak — 1], 
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there exists a constant (3 € (0, 1) such that 

(9) P(r > (-ln/J)nlnn + 0) < f3 e . 

Proof. When k > A(A + 2), the same argument used for the proper 
colorings chain can easily be extended to arbitrary temperatures T, when- 
ever we accept when T = 0, we will accept when T > 0. 

Suppose that A < k < A(A + 2). We assume the worst for Y, namely, 
that |lf(f)l = A + 1 for any v with |lf(f)l > 1. Let Wt denote the number 
of nodes with |^(v)| > 1. As in the previous section, we will proceed by 
considering 2£[Wt+i|Yi]. If > 1 and v is chosen, Wt is reduced by 1 

at the beginning of the step. In the other direction, we will say that node v 
causes Wt to be increased by 1 if some neighbor w of v is chosen, a color c 
in Yt(v) is chosen for that neighbor and we add c to Yt(w), but we do not 
exit the repeat loop. 

The expected value of Wt+i given Y t will just be Wt plus upper bounds 
on the expected increase to Wt minus the expected decrease in Wt- The 
expected decrease is just — W t /n. The expected increase caused by particular 
node w with |lt(io)| is always bounded above by (A + 1)A(1 — 7~ 1 )/[n(/c — 
A — 1)]. The factor of A is the maximum number of neighbors for w, the 
A + 1 factor the size of \Yt (w)\, the 1 — 7 -1 term is the maximum probability 
that a particular color in Yt(w) is the culprit for causing v to have a large 
color set, 1/n is the probability of choosing v, and l/(k — A — 1) is an upper 
bound on the probability that the first color for v that we do not reject 
based on the color of known neighbors is a particular color c. 

There are Wt such nodes w with |Yj(u;)| > 1, so 

Wt 

(10) E[W t +i \Y t ] <W t - — + Wt 

n 

Setting 02 = 1 - (l/n)[(A + 1)A(1 - 7~ 1 )/(fc - A - 1) - 1], an induction 
gives us 22 [W(] < fi\n and Markov's inequality, together with the fact that 
Wt is an integer, allows us to upper bound the probability that Wt > 0. 

In the final case k < A and we can make |lf(w)| = k. This changes the 
above argument in two places. First, we replace the factor of A + 1 by k. 
Second, we upper bound the probability of choosing a color by 1 rather than 
l/(k — A + 1). Hence in this case, 

Afc(l-7^ 1 ) ' 
n 

Setting fo = 1 - (l/n)[Ajfe(l - 7" 1 )] gives us E [Wt] < j3\n and the result 
follows in the same manner as above. □ 



(A + l)A(l- 7 - 
n(k-A-l) 



Wt 

(11) E[W t+ i \Y t ] <W t - — + W t 
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7. Sink free orientations of a graph. Given an undirected graph with 
node set TV and edge set E, an orientation of the graph is an assignment 
of direction to each of the individual nodes. A sink free orientation is an 
orientation such that every node has at least one outgoing edge, and so for 
this problem Q = {0, 1} E , where and 1 represent each of the two possible 
directions for each edge. In what follows we will assume without loss of gen- 
erality the graph is connected, since the number of sink free orientations on 
an unconnected graph is the product of the number of sink free orientations 
on each component. Moreover, we will assume that the degree of each node 
is at least two, since any leaf of the graph fixes the orientation of its adjacent 
edge outward and can be removed. As with the previous problems, count- 
ing the number of sink free orientations of a graph is JjP-complete [2] and 
self-reducible so that sample generation leads to an efficient approximation 
scheme. 

The first algorithm for generating almost uniformly from the sink free 
orientations of a graph (as long as the chain connects the state space) was 
the basic Gibbs sampler. This was shown to mix in 0([m 3 + mn ] ln(l/e)) 
time by Bubley and Dyer [2] using a simple coupling. Here e denotes distance 
of the variates from uniform distribution in total variation distance. 

The first perfect sampling algorithm [16] known for this problem was a 
bounding chain approach. This method ran in polynomial time, although 
a technical error in [16] makes the run time theorem proved there invalid. 
Here we reanalyze that chain with some small improvements to show poly- 
nomial running time. Because a new method for perfect sampling known as 
popping [4] will be faster than any coupling approach of this type, we do 
not attempt a tight bound on the running time, but rather concentrate on 
showing that it is polynomial. 

The Gibbs sampler chooses a random edge, and then chooses an orienta- 
tion for the edge uniformly from the set of orientations that do not create a 
sink. In what follows we will write the orientation of an edge {a, b} as (a, b) 
or (6, a) rather than or 1 to make clear the direction assigned. 

Gibbs sampler for sink free orientations of a graph complete coupling 

1. Choose {v,w} Ejj E 

2. Choose U Gu [0, 1] 

3. Case I: U < 1/2 and removing edge {w,v} does not make w a sink 

4. Let x({v, w}) <— (v, w) 

5. Case II: U > 1/2 and removing edge {w,v} does not make v a sink 

6. Let x({v, w}) <— (w, v) 

We will use Definition 1 for the bounding chain. As previously, we will call 
an edge {v,w} known if [^({u, w}) \ = 1 and unknown otherwise. Say that 
edge (a, b) leaves a, or its direction from a is outward, and it enters b or is 
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directed inwards towards b. There are several cases to consider in deciding 
what to do with chosen edge {v, w}. If there is a known edge leaving w and 
we choose (v, id), then the edge is known to be oriented in that direction. 
Similarly, a known edge leaving v allows us to orient the edge (w, v) with 
certainty. 

On the other hand, if all edges adjacent to w are known to enter w, then 
the only orientation possible for the edge is (w,v). Similarly, if all edges 
adjacent to v are known to enter v, the only possible orientation is (v,w). 
These cases result in the state of the chain (and therefore the bounding 
chain) remaining unchanged. 

Now suppose none of the above cases hold and, in addition, an edge 
adjacent to w is unknown. Then we do not know if the orientation (v,w) is 
valid or not: it depends on the state of the unknown edges. Hence if we try to 
change the orientation to (v,w), we must make Y({v,w}) = {(v,w), (w,v)}. 
Similarly, when none of the above cases hold and v is adjacent to an unknown 
edge, attempted moves to (w,v) result in {v,w} becoming unknown in Y. 



Bounding chain for Gibbs sampler for sink free orientations 

1. Choose {v, w} Go E 

2. Choose U e u [0,1] 

3. Case I: U < 1/2 and a known edge is leaving w 

4. Let y({v,w}) <-{(v,w)} 

5. Case II: U > 1/2 and a known edge is leaving v 

6. Let y({v,w}) <- {(w,v)} 

7. Case III: U < 1/2, no known edges leaves w, an unknown edge adj. to w 

8. Let y({v,w}) <- {(v,w), (w,v)} 

9. Case IV: U > 1/2, no known edges leaves v, an unknown edge adj. to v 
10. Let y({v,w}) <— {(v,w), (w,v)} 

Unfortunately, the bounding chain built directly from the complete cou- 
pling is worthless algorithmically. Suppose all of the edges in the state are 
unknown. No matter how many steps are taken in the bounding chain, the 
state of all edges will remain unknown. We need at least one known edge to 
start the bounding chain. 

This is accomplished by using three phases. Phase I consists of a single 
step in the Markov chain. Given that edge {v,w} was chosen and U < 1/2, 
one of two outcomes are possible. Either the edge is oriented (v,w) or the 
edge is oriented (w,v) and all the other edges adjacent to w are directed 
into w. These are the starting states Y 1 and Y 2 for two individual bounding 
chains. 

In Phase II we run these bounding chains using independent draws for 
lines 1 and 2. That is, we make different choices for the edge chosen and 
U for the {Y^} process and the {Y t 2 } process. Hopefully we end with each 
of the two bounding processes (independently) detecting complete coupling. 
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Finally, in Phase III the two states are run as a regular pairwise coupling just 
as Bubley and Dyer did in their original paper [2]. If at the end of Phase III, 
the two states have coupled, then we have achieved complete coupling. 

Formally, we have created a complete coupling on the bounded chain with 
a new state space that is the direct product of the Definition 1 state space 
and {7, II, 777} representing which phase we are in. We move from Phase I 
to Phase II automatically. We have to move from Phase II back to Phase I 
if either of our bounding process ever reaches the state where all edges are 
unknown. On the other hand, if both bounding processes detect complete 
coupling, we get to move on to Phase III. 

Since Phase III is the coupling of Bubley and Dyer [3] , we use their analysis 
to conclude that the expected time needed to finish Phase III is 0(m 3 + 
mn 3 ). Phase I always just takes a single step, and so we are left to analyze 
Phase II. 

Let Wt be the number of unknown edges at time t and consider the 
expectation of Wt+i given Yj. The key thing to note is that an edge can that 
go from known to unknown must be adjacent to at least 1 unknown edge. 
Say that edges {v, w} and {v, z] are changeable if one of two conditions hold. 
In condition (I), {v,w} is known to be (v,w) and edge is unknown, 

and no other edges are known to leave v. By picking {v,w} and attempting 
to direct it towards v, we might make this edge unknown and by picking edge 
{v,w} and directing it away from {v,w}, we might make it known. Hence 
the contribution to the expected change in Wt coming from this changeable 
pair is [l/(2j75|)](-l) + [l/(2|7?|)](l)=0. 

In condition (II), more than one edge is known to be leaving v and an 
edge {v,w} is unknown. If we pick this edge and choose the right direction, 
this edge becomes known. Hence Wt can only decrease from choosing this 
type of edge. Since the only edges that change Wt can be partitioned into 
condition (I) and (II) type edges, the expected value of Wt+\ given Y t is at 
most Wt- Thus the stochastic process Wo, W\, ... is a supermartingale. 

Suppose that there is a changeable pair. The probability that Wt changes 
at the next step is at least l/(2m). If no changeable pair exists, then we 
bide our time, running the chain forward until a pair becomes changeable. 
Since there are both known and unknown edges in Phase II, there exists a 
node a adjacent to both known and unknown edges. Let (b, a) be one of the 
known edges entering a (if any edge leaves a, we would have a changeable 
pair) . 

Call a node free if it is adjacent to at least two known outgoing edges. 
Note that any edge adjacent to a free node v can be oriented in any direction, 
moreover, if an edge (v,w) is switched to (w,v), then node w becomes free. 

Thus the free nodes take a random walk around the graph. There must 
be a free node connected to a by known edges. Starting at a, take a walk 
backwards along edges that enter the node we are currently visiting. Either 
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there is an incoming edge to our current node, in which case keep walking, 
or there are two or more outgoing edges, in which case we have found our 
free node. The number of nodes in the graph is finite, so eventually we must 
hit the same node twice, giving us our free node. 

On average the free node takes a step at least once every 2m moves. 
After 0(m 3 ) steps it will reach node b, at which point (b, a) has a chance 
of being reoriented (a,b), making the unknown edge adjacent to a together 
with (b, a) a changeable pair. The value of Wt then has a l/(2m) chance of 
changing. Hence the expected time between changes to Wt is bounded above 
by 0(m 5 ). 

Now, because Wt is a supermartingale, it has a 1/m chance of reaching 
before it hits m, given Wq < m — 1. We are running the two processes 
independently and so the probability that each hits before either hits m 
is at least 1/m 2 . The expected number of steps needed for such a random 
walk is at most m 2 and so altogether, the expected number of steps taken 
in Phase II is 0(m 2 ■ m? ■ m 5 ) = 0(m 9 ). 

Theorem 4. The expected number of steps needed to detect complete 
coupling for the sink free orientations bounding chain is 0(m 9 ). 

8. Extensions and open problems. There are several ways to extend the 
basic concept of bounding chains. For instance, V or C could be continuous 
rather than discrete. When V is continuous, the second form of bounding 
chains is more useful in this regard, but for C continuous the first form is 
more easily extended [18]. 

Another extension (both of bounding chains and methodologies such as 
CFTP) is to deal with more general couplings. All of the complete couplings 
we discussed in this work are memoryless, dealing with independent draws 
from (f>o,(pi, . . . . As we saw in the case of the transposition chain, these 
types of couplings are inherently limited and may not tightly bound the true 
running time of the chain. To deal with distributions such as uniform over 
linear extensions of a poset, it is necessary to use more general couplings. 

Bounding chains are a straightforward way for determining when a par- 
ticular complete coupling has in fact completely coupled. Their simplicity is 
offset somewhat by the fact that the complete coupling time obtained might 
not tightly bound the mixing time of the chain. Furthermore, no guidance is 
given for choosing a complete coupling that leads to a good bounding chain. 
Small changes in a complete coupling can lead to large differences in the 
behavior of the bounding chain. 

However, it is often possible for chains of both theoretical and practical 
interest to create a bounding chain that can be proven to detect coupling 
in polynomial time under certain conditions, and which runs quickly under 
more general situations. While the bounding chain idea is itself simple, it 
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raises a host of interesting theoretical and computational questions about 
how quickly it moves toward detection of coupling, some of which we were 
able to answer here, but many of which are still open. 
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